Execution History Guided Instruction Prefetching
Related resources
Branch History Guided Instruction Prefetching
Instruction cache misses stall the fetch stage of the processor pipeline and hence affect instruction supply to the processor. Instruction prefetching has been proposed as a mechanism to reduce instruction cache (I-cache) misses. However, a prefetch is effective only if accurate and initiated sufficiently early to cover the miss penalty. This paper presents a new hardware-based instruction pref...
Instruction Cache Prefetching Using Multilevel Branch Prediction
This paper presents an instruction cache prefetching mechanism capable of prefetching past branches in multiple-issue processors. Such processors at high clock rates often use small instruction caches which have significant miss rates. Prefetching from secondary cache can hide the instruction cache miss penalties but only if initiated sufficiently far ahead of the current program counter. Exist...
Instruction-Level Execution Migration
We introduce the Execution Migration Machine (EM), a novel data-centric multicore memory system architecture based on computation migration. Unlike traditional distributed memory multicores, which rely on complex cache coherence protocols to move the data to the core where the computation is taking place, our scheme always moves the computation to the core where the data resides. By doing away ...
Threaded prefetching: An adaptive instruction prefetch mechanism
We propose and analyze an adaptive instruction prefetch scheme, called threaded prefetching, that makes use of history information to guide the prefetching. The scheme is based on the observation that control flow paths are likely to repeat themselves. In the proposed scheme, we associate with each instruction block a number of threads that indicate the instruction blocks that have been brought i...
Effective Instruction Prefetching In Chip Multiprocessors
threaded application performance, often achieved through instruction level parallelism per chip is increasing, the software and hardware techniques to exploit the potential of studies mostly involve distributed shared memory multiprocessors and fetching will not be fully effective at masking the remote fetch latency. the effective address of the load instructions along that path based upon a hi...
Journal
Journal title: The Journal of Supercomputing
Year: 2004
ISSN: 0920-8542
DOI: 10.1023/b:supe.0000009319.31230.a9